Natural Language Grammar Induction Using a Constituent-Context Model

Authors

  • Dan Klein
  • Christopher D. Manning
Abstract

This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This method produces much higher quality analyses, giving the best published results on the ATIS dataset.
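As an illustration of the kind of procedure described above, here is a minimal, hypothetical Python sketch of a CCM-style EM loop over POS-tagged sentences. It is not the authors' implementation: to keep it short, each span's constituency posterior is updated independently (a naive-Bayes-style approximation), whereas the actual constituent-context model sums over tree-consistent binary bracketings with a dynamic program. The toy corpus, the initialization heuristic, and the smoothing constant are illustrative assumptions.

```python
from collections import defaultdict

def spans(n):
    """All spans (i, j) with 0 <= i < j <= n over a sentence of length n."""
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)]

def yield_and_context(tags, i, j):
    """Yield = the tag sequence inside span (i, j); context = the tags
    immediately to its left and right, with '#' for a sentence boundary."""
    y = tuple(tags[i:j])
    left = tags[i - 1] if i > 0 else '#'
    right = tags[j] if j < len(tags) else '#'
    return y, (left, right)

def ccm_em_sketch(corpus, iterations=10, smooth=0.1):
    """corpus: list of POS-tag sequences. Returns a dict mapping
    (sentence, i, j) to the estimated posterior that the span is a constituent."""
    # Initialization heuristic (an assumption of this sketch): the full-sentence
    # span is always a constituent; every other span starts at 0.3.
    post = {}
    for tags in corpus:
        n = len(tags)
        for (i, j) in spans(n):
            post[(tuple(tags), i, j)] = 1.0 if j - i == n else 0.3

    for _ in range(iterations):
        # M-step: accumulate expected yield/context counts for the constituent
        # class (c) and the distituent class (d) from the current posteriors.
        yc, cc, yd, cd = (defaultdict(float) for _ in range(4))
        total_c = total_d = 0.0
        for tags in corpus:
            for (i, j) in spans(len(tags)):
                y, ctx = yield_and_context(tags, i, j)
                p = post[(tuple(tags), i, j)]
                yc[y] += p; cc[ctx] += p; total_c += p
                yd[y] += 1 - p; cd[ctx] += 1 - p; total_d += 1 - p
        prior_c = total_c / (total_c + total_d)

        def prob(counts, total, key):
            # Add-smoothed multinomial estimate; the smoothing scheme is illustrative.
            return (counts[key] + smooth) / (total + smooth * (len(counts) + 1))

        # E-step (simplified): score each span independently as constituent vs.
        # distituent; the real model constrains the spans to form a binary tree.
        for tags in corpus:
            for (i, j) in spans(len(tags)):
                y, ctx = yield_and_context(tags, i, j)
                c = prior_c * prob(yc, total_c, y) * prob(cc, total_c, ctx)
                d = (1 - prior_c) * prob(yd, total_d, y) * prob(cd, total_d, ctx)
                post[(tuple(tags), i, j)] = c / (c + d)

    return post

# Toy usage: two short POS-tag sequences (tags are illustrative placeholders).
if __name__ == '__main__':
    toy = [['DT', 'NN', 'VBD', 'DT', 'NN'],
           ['DT', 'NN', 'VBD', 'IN', 'DT', 'NN']]
    posteriors = ccm_em_sketch(toy)
    for key, p in sorted(posteriors.items(), key=lambda kv: -kv[1])[:5]:
        print(key, round(p, 3))
```

What the sketch preserves is the core idea of the model: each span is described twice, once by its yield (the tag sequence it covers) and once by its linear context (the tags immediately adjacent to it), and EM alternates between re-estimating these multinomials and re-scoring the spans.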


Similar Articles

Posterior Decoding for Generative Constituent-Context Grammar Induction

In this project, we study the problem of natural language grammar induction from a database of sentence part-of-speech (POS) tags. We then present an implementation of the EM-based generative constituent-context model by Klein and Manning. We also present two posterior decoding approaches to be used in conjunction with the constituent-context model and evaluate their performance against regular...

Unsupervised Grammar Induction Using a Parent Based Constituent Context Model

Grammar induction is one of the attractive research areas of natural language processing. Since both supervised and, to some extent, semi-supervised grammar induction methods require large treebanks, which do not currently exist for many languages, we focused our attention on unsupervised approaches. The Constituent Context Model (CCM) seems to be the state of the art in unsupervised gramma...

A Generative Constituent-Context Model for Improved Grammar Induction

We present a generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts. Parameter search with EM produces higher quality analyses than previously exhibited by unsupervised systems, giving the best published unsupervised parsing results on the ATIS corpus. Experiments on Penn treebank sentences of comparable ...

Probabilistic Grammars and Hierarchical Dirichlet Processes

Probabilistic context-free grammars (PCFGs) have played an important role in the modeling of syntax in natural language processing and other applications, but choosing the proper model complexity is often difficult. We present a nonparametric Bayesian generalization of the PCFG based on the hierarchical Dirichlet process (HDP). In our HDP-PCFG model, the effective complexity of the grammar can ...

Grammar-based Classifier System: A Universal Tool for Grammatical Inference

Grammatical Inference deals with the problem of learning structural models, such as grammars, from different sorts of data patterns, such as artificial languages, natural languages, biosequences, speech, and so on. This article describes a new grammatical inference tool, the Grammar-based Classifier System (GCS), dedicated to learning grammars from data. GCS is a new model of Learning Classifier Systems i...


Publication date: 2001